Two characteristics are desired in a cluster: the variance of the data points assigned to it have low variance; the distance to other clusters is big. I'll call the first intra-cluster index and the second inter-cluster index.
The intra-cluster index is essentially a measure of the variance of the data assigned to a cluster. The lower this value is the better score we have.As such, the intra-cluster index of cluster $i$ is given by
$$S_i = \frac{1}{T_i} \sum_{j=1}^{T_i} \left \| X_j - A_i \right \|_p$$where $T_i$ is the number of data points assigned to cluster $i$, $X_j$ are the data points assigned to cluster $i$, $A_i$ is the centroid of the cluster $i$, and $p$ is the type of norm used.
The inter-cluster index is a measure of how scattered are the clusters relative to each other. This is index is always measured between two clusters. The value of this index $M_{ij}$ between cluter $i$ and $j$ is:
$$M_{ij} = \left \| A_i - A_j \right \|_p$$where $A_i$ and $A_j$ are the centroids of the $i$ and $j$ clusters, respectevely, and p is the type of norm being used.
The final index is a combination of both inter- and intra-cluster scores, for all clusters. The total score between any two clusters is:
$$R_{ij}=\frac{S_i + S_j}{M_{ij}}$$The lower both intra-cluster scores are and the higher the inter-cluster score is, the smaller the total score will be. A low total score between two clusters represents two good clusters.
However, we still have to combine the scores from all the clusters. We do this by computing $R_{ij}$ between all clusters and selecting the highest for each. It's a pessimistic approach, since we're selecting the worse score associated with each cluster. As such, the worst score for each cluster $i$ is:
$$\max_{j:i \neq j} R_{ij}$$The final David-Bouldin score is just the average of the worst case scenario scores:
$$DB=\frac{1}{K} \sum_{i=1}^{K} D_i$$where $K$ is the total number of clusters. The lower DB is, the better evaluated are the clusters.
In [ ]: